@acwhite211
Member

@acwhite211 acwhite211 commented Jan 12, 2026

Fixes #7551
Fixes #7617
Fixes #7626

This PR adds Django model definitions, constraints, and migrations for several legacy join tables and related entities that were present in existing Specify databases but missing from freshly initialized Sp7 databases.

These changes are based on a systematic comparison between an existing production database schema and a newly created schema, with the goal of ensuring that new databases accurately reflect the constraints and relationships relied on by legacy data and workflows.

Here is a link to an example of the difference between an Sp6 and Sp7 created database schema dump: https://www.diffchecker.com/qdMfXJCj/
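A comparison like this can also be scripted. The following is a minimal sketch, not the tooling actually used for this PR: it assumes both schemas were exported with `mariadb-dump --no-data`, that each `CREATE TABLE` body ends with a line starting `) ENGINE`, and that key columns carry no prefix lengths such as `name`(10). The function names are hypothetical.

```python
import re

# Assumed dump layout: CREATE TABLE `name` ( ... \n) ENGINE=...;
CREATE_RE = re.compile(r"CREATE TABLE `(\w+)` \((.*?)\n\) ENGINE", re.S)
PK_RE = re.compile(r"PRIMARY KEY \(([^)]*)\)")
FK_RE = re.compile(r"FOREIGN KEY \(([^)]*)\) REFERENCES `(\w+)` \(([^)]*)\)")


def _cols(raw):
    """Turn '`A`,`B`' into ('a', 'b')."""
    return tuple(col.strip(" `").lower() for col in raw.split(","))


def constraints(dump):
    """Collect primary key and foreign key constraints from a schema dump."""
    found = set()
    for table, body in CREATE_RE.findall(dump):
        table = table.lower()
        for cols in PK_RE.findall(body):
            found.add(("PK", table, _cols(cols)))
        for cols, ref_table, ref_cols in FK_RE.findall(body):
            found.add(("FK", table, _cols(cols), ref_table.lower(), _cols(ref_cols)))
    return found


def missing(sp6_dump, sp7_dump):
    """Constraints present in the Sp6 dump but absent from the Sp7 dump."""
    return sorted(constraints(sp6_dump) - constraints(sp7_dump))
```

Constraint names are ignored on purpose, since they can differ between databases even when the constraint itself is the same.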

After analyzing many schema dumps, the following differences were found consistently between databases created in Sp6 and databases created in Sp7:

16 Missing Foreign Key Constraints:

  • agent: FK (institutiontcid) -> institutionnetwork (institutionnetworkid)
  • collection: FK (institutionnetworkid) -> institutionnetwork (institutionnetworkid)
  • deaccessionpreparation: FK (createdbyagentid) -> agent (agentid)
  • deaccessionpreparation: FK (modifiedbyagentid) -> agent (agentid)
  • deaccessionpreparation: FK (deaccessionid) -> deaccession (deaccessionid)
  • deaccessionpreparation: FK (preparationid) -> preparation (preparationid)
  • project_colobj: FK (collectionobjectid) -> collectionobject (collectionobjectid)
  • project_colobj: FK (projectid) -> project (projectid)
  • sgrbatchmatchresultitem: FK (batchmatchresultsetid) -> sgrbatchmatchresultset (id) ON DELETE CASCADE
  • sgrbatchmatchresultset: FK (matchconfigurationid) -> sgrmatchconfiguration (id)
  • sp_schema_mapping: FK (spexportschemaid) -> spexportschema (spexportschemaid)
  • sp_schema_mapping: FK (spexportschemamappingid) -> spexportschemamapping (spexportschemamappingid)
  • specifyuser_spprincipal: FK (specifyuserid) -> specifyuser (specifyuserid)
  • specifyuser_spprincipal: FK (spprincipalid) -> spprincipal (spprincipalid)
  • spprincipal_sppermission: FK (sppermissionid) -> sppermission (sppermissionid)
  • spprincipal_sppermission: FK (spprincipalid) -> spprincipal (spprincipalid)

No missing unique constraints were found. In fact, Sp7-created databases had a few extra unique constraints compared to Sp6-created databases.

35 Missing or Changed Primary Key Constraints (mostly on tables not used in Sp7):

  • autonumsch_coll: PRIMARY KEY (collectionid, autonumberingschemeid)
  • autonumsch_div: PRIMARY KEY (divisionid, autonumberingschemeid)
  • autonumsch_dsp: PRIMARY KEY (disciplineid, autonumberingschemeid)
  • countryinfo: PRIMARY KEY (name)
  • deaccessionpreparation: PRIMARY KEY (deaccessionpreparationid)
  • dwcfish: PRIMARY KEY (dwcfishid)
  • dwcfishtissue: PRIMARY KEY (dwcfishtissueid)
  • dwckui: PRIMARY KEY (dwckuiid)
  • dwckuit: PRIMARY KEY (dwckuitid)
  • fishportalmapping: PRIMARY KEY (fishportalmappingid)
  • fwriportalmapping: PRIMARY KEY (fwriportalmappingid)
  • geoname: PRIMARY KEY (geonameid)
  • ios_colobjagents: PRIMARY KEY (oldid)
  • ios_colobjbio: PRIMARY KEY (oldid)
  • ios_colobjchron: PRIMARY KEY (oldid)
  • ios_colobjcnts: PRIMARY KEY (oldid)
  • ios_colobjgeo: PRIMARY KEY (oldid)
  • ios_colobjlitho: PRIMARY KEY (oldid)
  • ios_geogeo_cnt: PRIMARY KEY (oldid)
  • ios_geogeo_cty: PRIMARY KEY (oldid)
  • ios_geoloc: PRIMARY KEY (oldid)
  • ios_geoloc_cnt: PRIMARY KEY (oldid)
  • ios_geoloc_cty: PRIMARY KEY (oldid)
  • ios_taxon_pid: PRIMARY KEY (oldid)
  • project_colobj: PRIMARY KEY (projectid, collectionobjectid)
  • sgrbatchmatchresultitem: PRIMARY KEY (id)
  • sgrbatchmatchresultset: PRIMARY KEY (id)
  • sgrmatchconfiguration: PRIMARY KEY (id)
  • sp_schema_mapping: PRIMARY KEY (spexportschemamappingid, spexportschemaid)
  • specifyuser_spprincipal: PRIMARY KEY (specifyuserid, spprincipalid)
  • spprincipal_sppermission: PRIMARY KEY (sppermissionid, spprincipalid)
  • spstynthy: PRIMARY KEY (spstynthyid)
  • taxa2id: PRIMARY KEY (idtaxa2id)
  • tissue_web_search: PRIMARY KEY (tissue_web_searchid)
  • voucher_web_search: PRIMARY KEY (voucher_web_searchid)

The following tables were identified in the schema diff but were not added in this PR because they are legacy, client-specific, or no longer used by current Specify workflows:

  • countryinfo
  • geoname
  • dwcfish
  • dwcfishtissue
  • dwckui
  • dwckuit
  • fishportalmapping
  • fwriportalmapping
  • spstynthy
  • tissue_web_search
  • voucher_web_search
  • ios_colobj*
  • ios_geogeo*
  • ios_geoloc*
  • ios_taxon_pid

These tables do not appear to be required for new database creation or normal application operation, but let me know if any of them should be added.

Checklist

  • Self-review the PR after opening it to make sure the changes look good and
    self-explanatory (or properly documented)
  • Add relevant issue to release milestone
  • Add PR to documentation list
  • Add automated tests
  • Add a reverse migration if a migration is present in the PR

Testing instructions

  • Follow the process to create a new database in Specify 7 and confirm that it completes without errors.
  • Check the schema dump of the newly created database to see that it contains all the new schema created in this PR:
    mariadb-dump -uroot -proot --no-data db_name > dbname_schema.sql
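
The dump check can be partly automated. This is only a sketch: the `EXPECTED_TABLES` set is an assumption based on the join tables discussed in this PR, not an authoritative list of everything the PR adds, and `missing_tables` is a hypothetical helper.

```python
import re

# Assumed list of join tables to look for; adjust to what the PR really adds.
EXPECTED_TABLES = {
    "autonumsch_coll",
    "autonumsch_dsp",
    "autonumsch_div",
    "specifyuser_spprincipal",
    "spprincipal_sppermission",
    "sp_schema_mapping",
    "project_colobj",
}


def missing_tables(dump_sql, expected=EXPECTED_TABLES):
    """Return the expected table names that have no CREATE TABLE in the dump."""
    present = {name.lower() for name in re.findall(r"CREATE TABLE `(\w+)`", dump_sql)}
    return sorted(set(expected) - present)
```

To use it, read `dbname_schema.sql` into a string and pass it to `missing_tables`; an empty list means every expected table has a `CREATE TABLE` statement.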

@acwhite211
Member Author

I've added fixes for all of the unit tests that were failing in main. Currently, some unit tests are failing specifically because the new tables added from Specify 6 have multi-field primary keys. This is causing issues with our Django, datamodel, and SQLAlchemy code. I'm working through different options for workarounds.

@acwhite211 acwhite211 marked this pull request as ready for review January 27, 2026 14:38
@acwhite211
Member Author

Some of the new tables that need to be added from Sp6 have multi-field primary keys, which is causing a lot of issues for our datamodel-to-SQLAlchemy model generation. I created a solution where these tables are present in the Django model but are skipped when generating the SQLAlchemy model, so we can go ahead and finish this issue. We can address the SQLAlchemy version of the model another time.

@acwhite211
Member Author

Got the many-to-many fields working in the Django model, and got the unit tests working. Ready for review.

@acwhite211
Member Author

Looks good, thanks for removing those unused Specify 6 tables 👍

@melton-jason
Contributor

Hey everyone! While I was testing this PR, I ran into an issue with databases created from Specify 6. Specifically, Django expects and uses the model's primary key when working with relationships, and that single primary key does not exist in the many-to-many tables from Specify 6.

To fix this, I moved the many-to-many code out of the initial migration into its own migration, which also handles migrating the tables from Specify 6 to the new consistent format.

See the changes in a788c8c..43bc97d.

Some additional information about the migration is available in the migration itself:

This migration creates or normalizes the Many to Many Join Tables for all
instances (those created in Specify 6, and those created in Specify 7).
The migration is necessitated due to the fact that the Django version we're
using when the tables are being created does not support more than one Primary
Key per table.
This is problematic because Specify 6 Many to Many Join tables all had more
than one Primary Key.
The forwards migration steps are the following:
- Store existing Many To Many records in an intermediary source (Redis)
- Drop the old Many to Many tables
- Recreate the Many to Many tables via Django
- Migrate the stored records to the new Many to Many tables
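
A rough sketch of the four forward steps above, not the migration's actual code. It uses `sqlite3` and a plain dict as stand-ins for MariaDB and Redis so that it runs anywhere, and it assumes the recreated table follows Django's usual join-table shape (a single surrogate `id` primary key plus a unique pair of foreign key columns). `normalize_join_table` and its parameters are hypothetical names.

```python
import json
import sqlite3


def normalize_join_table(conn, store, table, left, right):
    """Rebuild a composite-key join table with a single-column primary key."""
    # 1. Store the existing rows in the intermediary store (Redis stand-in).
    rows = conn.execute(f"SELECT {left}, {right} FROM {table}").fetchall()
    store[table] = json.dumps(rows)

    # 2. Drop the old join table.
    conn.execute(f"DROP TABLE {table}")

    # 3. Recreate it with a surrogate id and a unique pair (assumed layout).
    conn.execute(
        f"CREATE TABLE {table} ("
        f"id INTEGER PRIMARY KEY AUTOINCREMENT, "
        f"{left} INTEGER NOT NULL, "
        f"{right} INTEGER NOT NULL, "
        f"UNIQUE ({left}, {right}))"
    )

    # 4. Copy the stored rows into the new table.
    conn.executemany(
        f"INSERT INTO {table} ({left}, {right}) VALUES (?, ?)",
        json.loads(store[table]),
    )
    conn.commit()

    # Only discard the stored copy once the new rows are committed.
    del store[table]
```

The stored copy is deleted last so that a failure in steps 2 to 4 still leaves the original rows recoverable from the store.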

To facilitate the migration, we utilize Redis to store the old values of the Many to Many tables during the time the older tables are torn down and reconstructed. The stored records are then later populated into the new tables.
See the Warning message from the migration:

WARNING: Data loss may occur if the Redis container is stopped once this
migration has been started but has not finished.
For example, if an error occurs during this migration and the Redis container
is stopped.
Please ensure data within Redis is persisted via a mount, or there are
additional backups of the following tables before this migration:
- autonumsch_coll
- autonumsch_dsp
- autonumsch_div
- specifyuser_spprincipal
- spprincipal_sppermission
- sp_schema_mapping
- project_colobj

Another caveat with this storage procedure is its memory usage.
We could consider using a (temporary) file as an alternative to Redis if this is a concern.
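
If we went the file route, it could look roughly like the sketch below. This is an illustration only: `stash_rows` and `iter_stashed_rows` are hypothetical helpers, and the JSON Lines format is just one possible choice. Rows are written and read one at a time, so memory use stays flat regardless of table size.

```python
import json
import os
import tempfile


def stash_rows(table, rows, directory=None):
    """Write rows to a temporary JSON Lines file and return its path."""
    fd, path = tempfile.mkstemp(prefix=f"{table}_", suffix=".jsonl", dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as handle:
        for row in rows:
            handle.write(json.dumps(list(row)) + "\n")
        # Push the data to disk before the old table is dropped.
        handle.flush()
        os.fsync(handle.fileno())
    return path


def iter_stashed_rows(path):
    """Yield the stashed rows back one at a time as tuples."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            yield tuple(json.loads(line))
```

The file would need to live on a persistent mount for the same reason the Redis data does, and it should only be removed after the new tables are populated.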

('createdbyagent', models.ForeignKey(db_column='CreatedByAgentID', null=True, on_delete=specifyweb.specify.models.protect_with_blockers, related_name='+', to='specify.agent')),
('modifiedbyagent', models.ForeignKey(db_column='ModifiedByAgentID', null=True, on_delete=specifyweb.specify.models.protect_with_blockers, related_name='+', to='specify.agent')),
('parent', models.ForeignKey(db_column='ParentItemID', null=True, on_delete=django.db.models.deletion.DO_NOTHING, related_name='children', to='specify.taxontreedefitem')),
('parent', models.ForeignKey(db_column='ParentItemID', null=True, on_delete=django.db.models.deletion.CASCADE, related_name='children', to='specify.taxontreedefitem')),
Contributor

Do you know what prompted the on delete behavior change of TaxonTreeDefItem -> parent from 76cec14?

Was it some comment about delete blockers from #7647, #6671, or #7674?

I imagine if this is a necessary change we'll have to apply the same change to other trees.
Maybe we can look into the delete and delete blocker behavior for TreeDefItem tables?

Projects

Status: 📋Back Log

2 participants